17. Regulatory landscape: HIPAA, Anonymization
Intro
Video Summary
Privacy laws vary from one country to another, but most of them are similar to the Health Insurance Portability and Accountability Act (HIPAA) in the USA and the General Data Protection Regulation (GDPR) in Europe.
If you want to learn more about it, there is more detail in the "EHR Data" course in this nanodegree. Consider this a bit of a recap with a focus on applying de-identification methods to DICOM data.
There is a lot to privacy laws, but the part that has the greatest impact on an AI engineer is the set of requirements for de-identifying incoming data. De-identification is important because the HIPAA Privacy Rule (and GDPR) institutes quite strict controls over “Protected Health Information”, while at the same time HIPAA has this to say about de-identified health information:
De-Identified Health Information. There are no restrictions on the use or disclosure of de-identified health information. De-identified health information neither identifies nor provides a reasonable basis to identify an individual. There are two ways to de-identify information; either: (1) a formal determination by a qualified statistician; or (2) the removal of specified identifiers of the individual and of the individual’s relatives, household members, and employers is required, and is adequate only if the covered entity has no actual knowledge that the remaining information could be used to identify the individual.
Essentially, a lot of restrictions and controls are removed for data that is considered de-identified. If you read this definition carefully, you will see that HIPAA suggests two methods for de-identification. Method #2 is actually somewhat straightforward, as HIPAA lists the specific identifiers that must be removed in 45 CFR 164.514, and we will practice with some of these in our final exercise.
Method #1 is worth a note here, though. You may wonder what statisticians have to do with privacy, but it might make sense if you remember that statisticians are what machine learning engineers used to be called before ML became widespread :) On a serious note, there is more to de-identifying data than removing unique identifiers and other fields that directly identify a person. For example, age alone is hardly enough to identify someone. But what if the age of an individual is very high, you know the country where they live, and you have access to additional information, such as a newspaper article that interviews the longest-living person from that country, who happens to have the same age as the one in your “de-identified” dataset? That would be sufficient to identify the individual. This is where statistics becomes important: a prudent approach is to prove, with statistical guarantees, that the chances of re-identifying an individual from patterns in the dataset are acceptably small. This is getting very close to the concept of differential privacy, which is outside the scope of this course; however, I will post some links in the final section.
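To make the re-identification risk a bit more concrete, here is a toy sketch (the records are made up for illustration, not taken from any course dataset) that counts how many entries share each combination of quasi-identifiers such as age and country. Any combination held by a single record is an easy target for linkage with outside information:

```python
# Toy illustration: even without names, rare combinations of
# quasi-identifiers (age, country) can single out an individual.
from collections import Counter

# Hypothetical "de-identified" records (names already removed)
records = [
    {"age": 34, "country": "US"},
    {"age": 34, "country": "US"},
    {"age": 117, "country": "FR"},  # unique outlier: trivially linkable to news reports
]

# Count how many records share each (age, country) combination
groups = Counter((r["age"], r["country"]) for r in records)

# Combinations that appear only once are re-identification risks
unique_combos = [combo for combo, count in groups.items() if count == 1]
print("Quasi-identifier combinations held by a single record:", unique_combos)
```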
When it comes to DICOM medical images, anonymization typically boils down to cleaning out the DICOM metadata tags. However, depending on the dataset, you might also want to take a closer look at the pixels. For example, you may find text that has been burnt directly into the images (not common in CT and MR, but quite common in X-rays) or facial features (as we’ve seen in some of the Slicer visualizations throughout the course). As always, a good AI engineer will inspect the data and flag anything with potentially identifying characteristics.
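As a rough illustration of cleaning out metadata tags, here is a minimal sketch using pydicom. The tag list is an illustrative subset chosen for this example, not the full set of identifiers from 45 CFR 164.514, and a real pipeline would also need to deal with burnt-in pixel text and facial features:

```python
# Minimal sketch of DICOM metadata anonymization with pydicom.
# Not a complete Safe Harbor implementation -- the tag list is illustrative only.
import pydicom

# Keywords of tags that commonly carry identifying information (illustrative subset)
TAGS_TO_BLANK = [
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
]

def anonymize(in_path, out_path):
    ds = pydicom.dcmread(in_path)
    for keyword in TAGS_TO_BLANK:
        if keyword in ds:            # only touch tags that are actually present
            setattr(ds, keyword, "") # blank the value but keep the tag
    ds.remove_private_tags()         # vendors sometimes stash identifiers in private tags
    ds.save_as(out_path)

# Hypothetical usage:
# anonymize("study/slice_001.dcm", "study_anon/slice_001.dcm")
```

Blanking values rather than deleting the tags keeps the files valid for downstream tools that expect those tags to exist; private tags are removed wholesale because their contents are vendor-specific and hard to audit.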
Summary & Exercise Instructions
New Vocabulary
- HIPAA - Health Insurance Portability and Accountability Act - key legislation in the USA that, among other things, defines the concept of Protected Health Information and the rules around handling it.
- GDPR - General Data Protection Regulation - European legislation that defines the principles of handling personal data, including health data.
With that, let’s move on and do a small exercise on anonymization - the last exercise of our course!